
[gptq] rebase to main #4695

Merged: 166 commits merged from gptq_infer into hpcaitech:feature/quant-gptq on Sep 12, 2023
Conversation

Xu-Kai (Contributor) commented Sep 12, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with keywords like fixed so that the linked issue is closed automatically upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

Fridge003 and others added 30 commits August 10, 2023 15:36
* improve stability of zero

* fix wrong index

* add record stream
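For context on the "add record stream" commit above: when a CUDA tensor is consumed on a stream other than the one it was allocated on, the caching allocator must be notified, or its memory can be recycled too early. A minimal sketch of the stock PyTorch pattern (illustrative only; not the code in this PR):

```python
import torch

side_stream = torch.cuda.Stream()
x = torch.randn(1024, device="cuda")  # allocated on the default stream

with torch.cuda.stream(side_stream):
    y = x * 2                         # x is consumed on side_stream

# Mark x as in use on side_stream, so that freeing x on the default stream
# does not let the allocator recycle its memory before side_stream finishes.
x.record_stream(side_stream)
```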
* style: apply formatter

* fix: add outdated warnings

* docs: add dataset format and polish

* docs: polish README

* fix: fix json format

* fix: fix typos

* revert: revert 7b example
* [cluster] add process group mesh

* [test] add process group mesh test

* force sync
* [pipeline] add stage manager

* [test] add pipeline stage manager test

* [pipeline] add docstring for stage manager
* [pipeline] add p2p communication

* [test] add p2p communication test

* [test] add rerun decorator

* [test] rename to avoid conflict
* [api] update optimizer wrapper to fit pipeline

* [pipeline] add base schedule

* [pipeline] add 1f1b schedule

* [test] add pipeline schedule utils test

* [pipeline] fix import
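For readers unfamiliar with the 1F1B schedule added above: each pipeline stage runs a few warmup forwards to fill the pipeline, then alternates one forward with one backward, then drains the outstanding backwards. A minimal sketch under the assumption of opaque `forward_step`/`backward_step` callables (hypothetical names; inter-stage communication is omitted):

```python
def one_f_one_b(stage_id: int, num_stages: int, num_microbatches: int,
                forward_step, backward_step):
    # Warmup: enough forwards to fill the pipeline downstream of this stage.
    num_warmup = min(num_stages - stage_id - 1, num_microbatches)
    pending = [forward_step() for _ in range(num_warmup)]

    # Steady state: one forward, then one backward of the oldest microbatch.
    for _ in range(num_microbatches - num_warmup):
        pending.append(forward_step())
        backward_step(pending.pop(0))

    # Cooldown: drain the remaining backwards.
    while pending:
        backward_step(pending.pop(0))
```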
* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict
* [pipeline] add stage manager

* [test] add pipeline stage manager test

* [pipeline] add docstring for stage manager
* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict
…licy (hpcaitech#4161)

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

* add bloom model and policy, revise the base class of policy

* revise

* revision

* add bert_for_pretraining
…pcaitech#4172)

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name conflict

* add bloom model and policy, revise the base class of policy

* revise

* revision

* add bert_for_pretraining

* add bert_for_pretraining forward and policy

* fix typos

* cancel warning

* change the immediate output to default dict

* change the default output of get_shared_params
…itech#4187)

* move bert related pipeline components to shardformer

* fix bugs

* revision

* fix bert model tests

* fix bert_lm_head model tests

* fix tests

* fix tests

* done checks

* skip bloom
* [shardformer] support lazy init

* [shardformer] linear support lazy init

* [shardformer] embedding support lazy init

* [shardformer] norm support lazy init

* [shardformer] fused linear support lazy init

* [test] update shardformer test layer

* [test] shardformer with lazy init fit ddp

* [lazy] hotfix deepcopy of param

* [shardformer] fix bert policy and update test

* [shardformer] fix bloom policy and update test

* [shardformer] fix opt policy and update test

* [shardformer] fix t5 policy and update test

* [shardformer] fix gpt2 policy and update test

* [shardformer] fix llama policy and update test
* add pipeline forward

* complete pipeline forward check

* fix bert forward without pipeline

* fix comments

* discard useless line

* add todo

* clean prints

* fix distribute layers
* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt
…line (hpcaitech#4208)

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision
* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* finish bloom model

* test shard gpt2

* clear cache
…4224)

* fix typehint & docstring in sharder.py

* update pipeline forward for GPT2Model

* add test for pipeline forward of GPT2Model

* add cache cleaning in gpt2 test

* change assert to raise command
* add forward for GPTLMHeadModel

* add test for gpt_lm

* arranging get_held_layers method

* arrange forward replacement

* add forward for GPT2ForTokenClassification

* add forward for GPT2ForSequenceClassification

* fix test_shard_gpt2.py

* add GPT2DoubleHeadsModel & fix bugs

* add id checking in get_shared_params
* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a.

This policy should be reverted and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

* add pure pipeline test

* finish some bert models

* finish all bert models

* finish bert tests

* fix bugs

* fix bugs

* fix test pipeline

* fix data gen for qa

* update the set pipeline forward

* shared params

* fix bugs
* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* finish bloom model

* test shard gpt2

* clear cache

* support all bloom models

* add bloom models policies

* finish bloom pipeline and tests

* add set pipeline

* finish bloom
ver217 and others added 12 commits September 6, 2023 23:41
…HybridParallelPlugin (hpcaitech#4624)

* Enable policy assignment in HybridPlugin and enable llama policy for llamav2

* Remove Policy from Plugin

* revert changes of plugin

HybridParallelModule

* revert changes in plugin

* upgrade transformers

* revert transformers version

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
)

* set optimizer to optional in execute_pipeline

* arrange device and mixed precision in booster init

* fix execute_pipeline in booster.py
* update vit example for hybrid plugin

* reset tp/pp size

* fix dataloader iteration bug

* update optimizer passing in evaluation/add grad_accum

* change criterion

* wrap tqdm

* change grad_accum to grad_checkpoint

* fix pbar
* [devops] fix concurrency group

* [devops] fix compatibility test

* [devops] fix tensornvme install

* [devops] fix tensornvme install

* [devops] fix colossalai install
hpcaitech#4645)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix
…aitech#4671)

* [legacy] move communication to legacy (hpcaitech#4640)

* [legacy] refactor logger and clean up legacy codes (hpcaitech#4654)

* [legacy] make logger independent to gpc

* [legacy] make optim independent to registry

* [legacy] move test engine to legacy

* [legacy] move nn to legacy (hpcaitech#4656)

* [legacy] move nn to legacy

* [checkpointio] fix save hf config

* [test] remove useless rpc pp test

* [legacy] fix nn init

* [example] skip tutorial hybrid parallel example

* [devops] test doc check

* [devops] test doc check
* [shardformer] fix gpt2 test

[shardformer] fix gpt2 test

[shardformer] fix gpt2 test

* fix

* [shardformer] add todo

* [shardformer] add todo
…nd related kernels for our inference system (hpcaitech#4577)

* [infer] Infer/llama demo (hpcaitech#4503)

* add

* add infer example

* finish

* finish

* stash

* fix

* [Kernels] add inference token attention kernel (hpcaitech#4505)

* add token forward

* fix tests

* fix comments

* add try import triton

* add adapted license

* add tests check

* [Kernels] add necessary kernels (llama & bloom) for attention forward and kv-cache manager (hpcaitech#4485)

* added _vllm_rms_norm

* change place

* added tests

* added tests

* modify

* adding kernels

* added tests:

* adding kernels

* modify

* added

* updating kernels

* adding tests

* added tests

* kernel change

* submit

* modify

* added

* edit comments

* change name

* change comments and fix import

* add

* added

* combine codes (hpcaitech#4509)

* [feature] add KV cache manager for llama & bloom inference (hpcaitech#4495)

* add kv cache memory manager

* add stateinfo during inference

* format

* format

* rename file

* add kv cache test

* revise on BatchInferState

* file dir change
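To make the "kv cache memory manager" commits above concrete, here is a deliberately simplified sketch of the idea: pre-allocate key/value buffers once and hand out token slots from a free list during decoding. The class name and layout are illustrative, not the actual colossalai API:

```python
import torch

class SimpleKVCacheManager:
    """Illustrative only: a pre-allocated KV cache indexed by token slot."""

    def __init__(self, num_layers, num_heads, head_dim, max_tokens,
                 dtype=torch.float16, device="cuda"):
        shape = (num_layers, max_tokens, num_heads, head_dim)
        self.k_cache = torch.empty(shape, dtype=dtype, device=device)
        self.v_cache = torch.empty(shape, dtype=dtype, device=device)
        self.free_slots = list(range(max_tokens))

    def alloc(self, num_tokens):
        # Hand out physical slots for newly generated tokens.
        assert len(self.free_slots) >= num_tokens, "KV cache is full"
        slots = self.free_slots[:num_tokens]
        del self.free_slots[:num_tokens]
        return slots

    def free(self, slots):
        # Return slots to the pool when a sequence finishes.
        self.free_slots.extend(slots)
```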

* [Bug Fix] import llama context ops fix (hpcaitech#4524)

* added _vllm_rms_norm

* change place

* added tests

* added tests

* modify

* adding kernels

* added tests:

* adding kernels

* modify

* added

* updating kernels

* adding tests

* added tests

* kernel change

* submit

* modify

* added

* edit comments

* change name

* change comments and fix import

* add

* added

* fix

* add ops into init.py

* add

* [Infer] Add TPInferEngine and fix file path (hpcaitech#4532)

* add engine for TP inference

* move file path

* update path

* fix TPInferEngine

* remove unused file

* add engine test demo

* revise TPInferEngine

* fix TPInferEngine, add test

* fix

* Add Inference test for llama (hpcaitech#4508)

* add kv cache memory manager

* add stateinfo during inference

* add

* add infer example

* finish

* finish

* format

* format

* rename file

* add kv cache test

* revise on BatchInferState

* add inference test for llama

* fix conflict

* feature: add some new features for llama engine

* adapt colossalai triton interface

* Change the parent class of llama policy

* add nvtx

* move llama inference code to tensor_parallel

* fix __init__.py

* rm tensor_parallel

* fix: fix bugs in auto_policy.py

* fix: rm some unused codes

* mv colossalai/tpinference to colossalai/inference/tensor_parallel

* change __init__.py

* save change

* fix engine

* Bug fix: Fix hang

* remove llama_infer_engine.py

---------

Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>

* [infer] Add Bloom inference policy and replaced methods (hpcaitech#4512)

* add bloom inference methods and policy

* enable pass BatchInferState from model forward

* revise bloom infer layers/policies

* add engine for inference (draft)

* add test for bloom infer

* fix bloom infer policy and flow

* revise bloom test

* fix bloom file path

* remove unused codes

* fix bloom modeling

* fix dir typo

* fix trivial

* fix policy

* clean pr

* trivial fix

* Revert "[infer] Add Bloom inference policy and replaced methods (hpcaitech#4512)" (hpcaitech#4552)

This reverts commit 17cfa57.

* [Doc] Add colossal inference doc (hpcaitech#4549)

* create readme

* add readme.md

* fix typos

* [infer] Add Bloom inference policy and replaced methods (hpcaitech#4553)

* add bloom inference methods and policy

* enable pass BatchInferState from model forward

* revise bloom infer layers/policies

* add engine for inference (draft)

* add test for bloom infer

* fix bloom infer policy and flow

* revise bloom test

* fix bloom file path

* remove unused codes

* fix bloom modeling

* fix dir typo

* fix trivial

* fix policy

* clean pr

* trivial fix

* trivial

* Fix Bugs In Llama Model Forward (hpcaitech#4550)

* add kv cache memory manager

* add stateinfo during inference

* add

* add infer example

* finish

* finish

* format

* format

* rename file

* add kv cache test

* revise on BatchInferState

* add inference test for llama

* fix conflict

* feature: add some new features for llama engine

* adapt colossalai triton interface

* Change the parent class of llama policy

* add nvtx

* move llama inference code to tensor_parallel

* fix __init__.py

* rm tensor_parallel

* fix: fix bugs in auto_policy.py

* fix:rm some unused codes

* mv colossalai/tpinference to colossalai/inference/tensor_parallel

* change __init__.py

* save change

* fix engine

* Bug fix: Fix hang

* remove llama_infer_engine.py

* bug fix: fix bugs about infer_state.is_context_stage

* remove policies

* fix: delete unused code

* fix: delete unused code

* remove unused code

* fix conflict

---------

Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>

* [doc] add colossal inference fig (hpcaitech#4554)

* create readme

* add readme.md

* fix typos

* upload fig

* [NFC] fix docstring for colossal inference (hpcaitech#4555)

Fix docstring and comments in kv cache manager and bloom modeling

* fix docstring in llama modeling (hpcaitech#4557)

* [Infer] check import vllm (hpcaitech#4559)

* change import vllm

* import apply_rotary_pos_emb

* change import location

* [DOC] add installation req (hpcaitech#4561)

* add installation req

* fix

* slight change

* remove empty

* [Feature] rms-norm transfer into inference llama.py  (hpcaitech#4563)

* add installation req

* fix

* slight change

* remove empty

* add rmsnorm policy

* add

* clean codes

* [infer] Fix tp inference engine (hpcaitech#4564)

* fix engine prepare data

* add engine test

* use bloom for testing

* revise on test

* revise on test

* reset shardformer llama (hpcaitech#4569)

* [infer] Fix engine - tensors on different devices (hpcaitech#4570)


* fix diff device in engine

* [codefactor] Feature/colossal inference (hpcaitech#4579)

* code factors

* remove

* change coding (hpcaitech#4581)

* [doc] complete README of colossal inference (hpcaitech#4585)

* complete fig

* Update README.md

* [doc] update readme (hpcaitech#4586)

* update readme

* Update README.md

* bug fix: fix bugs in llama and bloom (hpcaitech#4588)

* [BUG FIX] Fix test engine in CI and non-vllm kernels llama forward (hpcaitech#4592)

* fix tests

* clean

* clean

* fix bugs

* add

* fix llama non-vllm kernels bug

* modify

* clean codes

* [Kernel] Rmsnorm fix (hpcaitech#4598)

* fix tests

* clean

* clean

* fix bugs

* add

* fix llama non-vllm kernels bug

* modify

* clean codes

* add triton rmsnorm

* delete vllm kernel flag
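For reference, the op the Triton rmsnorm kernel above implements has simple semantics; a plain PyTorch version (the reference math only, not the kernel itself) looks like this:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # Normalize by the root-mean-square over the hidden dimension,
    # then apply a learned per-channel scale (LLaMA-style RMSNorm).
    variance = x.float().pow(2).mean(dim=-1, keepdim=True)
    x_normed = x.float() * torch.rsqrt(variance + eps)
    return (weight * x_normed).to(weight.dtype)
```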

* [Bug Fix] Fix bugs in llama (hpcaitech#4601)

* fix tests

* clean

* clean

* fix bugs

* add

* fix llama non-vllm kernels bug

* modify

* clean codes

* bug fix: remove rotary_positions_ids

---------

Co-authored-by: cuiqing.li <lixx3527@gmail.com>

* [kernel] Add triton layer norm & replace norm for bloom (hpcaitech#4609)

* add layernorm for inference

* add test for layernorm kernel

* add bloom layernorm replacement policy

* trivial: path

* [Infer] Bug fix rotary embedding in llama (hpcaitech#4608)

* fix rotary embedding

* delete print

* fix init seq len bug

* rename pytest

* add benchmark for llama

* refactor codes

* delete useless code

* [bench] Add bloom inference benchmark (hpcaitech#4621)

* add bloom benchmark

* readme - update benchmark res

* trivial - uncomment for testing (hpcaitech#4622)

* [Infer] add check triton and cuda version for tests (hpcaitech#4627)

* fix rotary embedding

* delete print

* fix init seq len bug

* rename pytest

* add benchmark for llama

* refactor codes

* delete useless code

* add check triton and cuda

* Update sharder.py (hpcaitech#4629)

* [Inference] Hot fix some bugs and typos (hpcaitech#4632)

* fix

* fix test

* fix conflicts

* [typo] Comments fix (hpcaitech#4633)

* fallback

* fix comments

* bug fix: fix some bugs in test_llama and test_bloom (hpcaitech#4635)

* [Infer] delete benchmark in tests and fix bug for llama and bloom (hpcaitech#4636)

* fix rotary embedding

* delete print

* fix init seq len bug

* rename pytest

* add benchmark for llama

* refactor codes

* delete useless code

* add check triton and cuda

* delete benchmark and fix infer bugs

* delete benchmark for tests

* delete useless code

* delete benchmark function in utils

* [Fix] Revise TPInferEngine, inference tests and benchmarks (hpcaitech#4642)

* [Fix] revise TPInferEngine methods and inference tests

* fix llama/bloom infer benchmarks

* fix infer tests

* trivial fix: benchmarks

* trivial

* trivial: rm print

* modify utils filename for infer ops test (hpcaitech#4657)

* [Infer] Fix TPInferEngine init & inference tests, benchmarks (hpcaitech#4670)

* fix engine funcs

* TPInferEngine: receive shard config in init

* benchmarks: revise TPInferEngine init

* benchmarks: remove pytest decorator

* trivial fix

* use small model for tests

* [NFC] use args for infer benchmarks (hpcaitech#4674)

* revise infer default (hpcaitech#4683)

* [Fix] optimize/shard model in TPInferEngine init (hpcaitech#4684)

* remove using orig model in engine

* revise inference tests

* trivial: rename

---------

Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Xu Kai <xukai16@foxmail.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
* update booster_api.md

* update booster_checkpoint.md

* update booster_plugins.md

* move transformers importing inside function

* fix Dict typing

* fix autodoc bug

* small fix
* [shardformer] update shardformer readme

* [shardformer] update shardformer readme

* [shardformer] update shardformer readme

* [shardformer] update shardformer readme

* [shardformer] update shardformer readme
Xu-Kai closed this Sep 12, 2023
Xu-Kai reopened this Sep 12, 2023
* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test
* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]
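At a high level, the GPTQ linear layers added in this PR (see cai_quant_linear.py in the coverage report below) store weights as packed low-bit integers plus scales and zero points. The following is a conceptual dequantize-then-matmul sketch, assuming 4-bit weights packed eight per int32 with per-channel scales; the real kernels fuse these steps and use group-wise quantization, so the layout here is illustrative only:

```python
import torch

def gptq_linear(x, qweight, scales, zeros, bits=4):
    # qweight: int32 of shape (in_features // (32 // bits), out_features),
    # packing 32 // bits quantized values per int32 along dim 0.
    mask = (1 << bits) - 1
    shifts = torch.arange(0, 32, bits, device=qweight.device)

    # Unpack to integers in [0, 2**bits).
    w = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & mask
    w = w.reshape(-1, qweight.shape[1])      # (in_features, out_features)

    # Dequantize per output channel, then do a regular matmul.
    w = (w.float() - zeros.float()) * scales.float()
    return x @ w.to(x.dtype)
```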
Xu-Kai force-pushed the gptq_infer branch 2 times, most recently from f29b2d0 to ddb3c54 on September 12, 2023 09:08
Xu-Kai enabled auto-merge (rebase) September 12, 2023 09:15
auto-merge was automatically disabled September 12, 2023 09:16

Rebase failed

Xu-Kai merged commit 183231c into hpcaitech:feature/quant-gptq on Sep 12, 2023
8 checks passed
github-actions (Contributor) commented:

The code coverage for the changed files is 5%.

Complete report:
Name                                           Stmts   Miss  Cover
------------------------------------------------------------------
colossalai/gptq/__init__.py                        3      3     0%
colossalai/gptq/cai_gptq/__init__.py              12     12     0%
colossalai/gptq/cai_gptq/cai_quant_linear.py     160    160     0%
colossalai/gptq/cai_gptq/gptq_op.py               24     24     0%
colossalai/gptq/cai_gptq/gptq_triton.py          186    186     0%
colossalai/gptq/gptq_tp.py                        98     98     0%
colossalai/gptq/models/__init__.py                 2      2     0%
colossalai/gptq/models/bloom.py                   13     13     0%
colossalai/gptq/models/llama.py                   14     14     0%
op_builder/gptq.py                                28     28     0%
tests/test_gptq/test_gptq_linear.py              206    169    18%
------------------------------------------------------------------
TOTAL                                            746    709     5%

ver217 mentioned this pull request on Sep 20, 2023